
feat: Support inline eval definitions #43

Merged: joshblack merged 6 commits into main from copilot/define-evals-inline-in-experiment on Jun 30, 2026

Conversation

Copilot AI commented Jun 11, 2026

Contributor

Experiments can now reference evals defined outside the repository’s generated eval registry. Inline evals include a name, project-local directory, and optional config/test path overrides that resolve from the CLI working directory.

  • Experiment config

    • Allows evals entries to be either built-in eval IDs or inline eval objects.
    • Adds shared eval config types for prompt metadata.
  • Eval resolution

    • Resolves built-in eval IDs through the existing registry.
    • Resolves inline eval paths relative to process.cwd().
    • Supports inline config, configPath, and testPath.
    • Preserves sandbox spoofing behavior by normalizing inline evals into the same runtime shape as generated evals.
  • Validation + docs

    • Adds focused coverage for built-in lookup, cwd-relative inline paths, and custom config/test paths.
    • Documents inline eval usage.
export const experiment: ExperimentConfig = {
  name: 'Local project experiment',
  description: 'Run an eval from the current project',
  models: ['gpt-5.5'],
  evals: [
    {
      name: 'local-button-eval',
      path: './evals/button',
      config: {
        prompt: 'Update the local project to use a Primer button',
      },
      testPath: 'button.eval.test.ts',
    },
  ],
  treatments: [],
}

@joshblack joshblack left a comment

Member

@copilot evals should use the same structure as what we have. Folks shouldn't have to specify the test path or prompt inline; it should live in the eval config in the eval itself.

Copilot AI commented Jun 30, 2026

Contributor Author

@copilot evals should use the same structure as what we have. Folks shouldn't have to specify the test path or prompt inline; it should live in the eval config in the eval itself.

Addressed in 4449adb by removing inline prompt/test path overrides and resolving inline evals from their eval.config.ts and eval.test.ts files.
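
The resolution described in this reply can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `resolveInlineEvalPaths` and the `InlineEvalPaths` interface are hypothetical. Only the file names `eval.config.ts` and `eval.test.ts` and the cwd-relative lookup come from this thread.

```typescript
import * as path from 'node:path'

// Hypothetical result shape; the PR does not show the real type.
interface InlineEvalPaths {
  directory: string
  configPath: string
  testPath: string
}

// Resolve an inline eval directory against the working directory and derive
// the two conventional file locations inside it.
function resolveInlineEvalPaths(evalPath: string, cwd: string = process.cwd()): InlineEvalPaths {
  const directory = path.resolve(cwd, evalPath)
  return {
    directory,
    configPath: path.join(directory, 'eval.config.ts'),
    testPath: path.join(directory, 'eval.test.ts'),
  }
}

const buttonEval = resolveInlineEvalPaths('./evals/button', '/work/project')
```

With this convention the experiment entry only needs to point at a directory; the prompt and test stay with the eval itself, as the review asked.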

Copilot AI requested a review from joshblack June 30, 2026 15:08
@joshblack

Member

@copilot resolve the merge conflicts in this pull request

Copilot AI added 2 commits June 30, 2026 15:11
…inline-in-experiment

# Conflicts:
#	packages/experiment/src/config.ts

Copilot AI commented Jun 30, 2026

Contributor Author

@copilot resolve the merge conflicts in this pull request

Resolved the merge conflicts in d76ea88 and repaired the merged pnpm lockfile in ef8c140.

Comment thread README.md Outdated
  models: ['gpt-5.5'],
  evals: [
    {
      name: 'local-button-eval',

Member

@copilot name should be optional. By default, let's use the directory name (e.g. ./evals/local-button-eval should have the default name local-button-eval)

Contributor Author

Addressed in 5cdc3b5 by making inline eval name optional and defaulting it from the eval directory name.
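
One way to derive the default name is sketched below. The `InlineEval` shape and the `inlineEvalName` helper are assumptions made for illustration; the thread only states that `name` is optional and falls back to the eval directory name.

```typescript
import * as path from 'node:path'

// Hypothetical inline eval entry: `name` is optional, `path` is required.
interface InlineEval {
  name?: string
  path: string
}

// Use the explicit name when given; otherwise fall back to the last segment
// of the resolved directory. Resolving first normalizes trailing slashes and
// handles paths such as '.'.
function inlineEvalName(entry: InlineEval, cwd: string = process.cwd()): string {
  return entry.name ?? path.basename(path.resolve(cwd, entry.path))
}

const defaulted = inlineEvalName({path: './evals/local-button-eval'}, '/work/project')
const explicit = inlineEvalName({name: 'custom', path: './evals/local-button-eval'}, '/work/project')
// defaulted === 'local-button-eval', explicit === 'custom'
```

Note that `??` only falls back on `undefined` or `null`, so an empty-string name would be kept as given in this sketch.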

Copilot AI requested a review from joshblack June 30, 2026 16:16
@joshblack joshblack marked this pull request as ready for review June 30, 2026 16:23
Copilot AI review requested due to automatic review settings June 30, 2026 16:23
@joshblack joshblack merged commit 5dc2ed6 into main Jun 30, 2026
7 checks passed
@joshblack joshblack deleted the copilot/define-evals-inline-in-experiment branch June 30, 2026 16:23
Copilot stopped reviewing on behalf of joshblack due to an error June 30, 2026 16:24

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds support for referencing “inline” evals (local directories) in experiment configs, while standardizing type exports and import specifiers across packages.

Changes:

  • Extend ExperimentConfig.evals to accept either built-in eval IDs or inline eval directory references.
  • Add eval resolution logic in agent-eval to load inline eval config/tests from disk, and wire it into the CLI.
  • Normalize internal imports by removing explicit .ts extensions.
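
The first change above amounts to a union type for `evals` entries. The sketch below shows how such a union could be told apart at runtime. The type shapes, the `EvalReference` result, and `classifyEval` are hypothetical; the real `ExperimentEvalConfig` in packages/experiment/src/config.ts may differ.

```typescript
// Hypothetical shapes for illustration only.
type InlineEvalConfig = {name?: string; path: string}
type ExperimentEvalConfig = string | InlineEvalConfig

type EvalReference =
  | {kind: 'built-in'; id: string}
  | ({kind: 'inline'} & InlineEvalConfig)

// A string entry is treated as a built-in eval ID; an object entry is
// treated as an inline eval directory reference.
function classifyEval(entry: ExperimentEvalConfig): EvalReference {
  if (typeof entry === 'string') {
    return {kind: 'built-in', id: entry}
  }
  return {kind: 'inline', ...entry}
}

const mixedEntries: ExperimentEvalConfig[] = ['some-built-in-eval', {path: './evals/button'}]
const references = mixedEntries.map(entry => classifyEval(entry))
```

Tagging each entry with a `kind` lets later code handle both cases without repeating the `typeof` check.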
| File | Description |
| --- | --- |
| packages/sandbox/src/index.ts | Normalizes relative import specifiers (drops .ts). |
| packages/experiment/src/index.ts | Re-exports new eval-related config types and normalizes import specifier. |
| packages/experiment/src/config.ts | Introduces ExperimentEvalConfig and updates ExperimentConfig.evals type accordingly. |
| packages/evals/src/index.ts | Normalizes generated module import specifiers (drops .ts). |
| packages/agent-eval/src/treatment.ts | Switches treatments to reference ResolvedEval instead of Eval. |
| packages/agent-eval/src/eval.ts | Adds resolver for built-in vs inline evals (filesystem validation + config import). |
| packages/agent-eval/src/eval.test.ts | Adds Vitest coverage for built-in and inline eval resolution behavior. |
| packages/agent-eval/src/config.ts | Reuses shared EvalConfig type from @primer/agent-experiment. |
| packages/agent-eval/src/cli.ts | Resolves evals upfront (built-in or inline) and uses resolved eval objects when constructing treatments. |
| README.md | Documents inline eval usage and expected file structure. |

Review details

  • Files reviewed: 10/10 changed files
  • Comments generated: 3
  • Review effort level: Low

Comment on lines +42 to +47
async function loadEvalConfig(configPath: string, name: string): Promise<EvalConfig> {
  const configModule = (await import(configPath)) as {default?: unknown}
  if (!isEvalConfig(configModule.default)) {
    throw new Error(`Eval "${name}" config must export a default config with a prompt`)
  }
  return configModule.default

Comment on lines +28 to +46
async function assertDirectory(directory: string, name: string) {
  const stats = await fs.stat(directory).catch(() => undefined)
  if (!stats?.isDirectory()) {
    throw new Error(`Eval "${name}" directory was not found: ${directory}`)
  }
}

async function assertFile(filepath: string, name: string) {
  const stats = await fs.stat(filepath).catch(() => undefined)
  if (!stats?.isFile()) {
    throw new Error(`Eval "${name}" test file was not found: ${filepath}`)
  }
}

async function loadEvalConfig(configPath: string, name: string): Promise<EvalConfig> {
  const configModule = (await import(configPath)) as {default?: unknown}
  if (!isEvalConfig(configModule.default)) {
    throw new Error(`Eval "${name}" config must export a default config with a prompt`)
  }

Comment on lines +19 to +26
function isEvalConfig(value: unknown): value is EvalConfig {
  return (
    value !== null &&
    typeof value === 'object' &&
    'prompt' in value &&
    typeof (value as Record<string, unknown>).prompt === 'string'
  )
}
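
To make the guard's behavior concrete, here is a small self-contained sketch that reuses the same check against a few sample values. The one-field `EvalConfig` interface is an assumption; the shared type exported from @primer/agent-experiment may carry more fields.

```typescript
// Assumption: a config needs at least a string `prompt`.
interface EvalConfig {
  prompt: string
}

// Same check as in the reviewed snippet: a non-null object whose `prompt`
// property is a string.
function isEvalConfig(value: unknown): value is EvalConfig {
  return (
    value !== null &&
    typeof value === 'object' &&
    'prompt' in value &&
    typeof (value as Record<string, unknown>).prompt === 'string'
  )
}

const candidates: unknown[] = [
  {prompt: 'Update the local project to use a Primer button'}, // valid config
  {prompt: 42}, // wrong type for prompt
  {}, // missing prompt
  null, // not an object
  'prompt', // a string, not an object
]
const accepted = candidates.map(candidate => isEvalConfig(candidate))
// accepted is [true, false, false, false, false]
```

The guard only checks the type of `prompt`, so an empty string such as `{prompt: ''}` would also pass.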

3 participants